\chapter{Sampling Methods}\label{uq:sampling}

This chapter introduces several fundamental concepts related to
sampling methods. In particular, the statistical properties of the
Monte Carlo estimator are discussed
(\S\ref{uq:sampling:montecarlo}) and strategies for multilevel and
multifidelity sampling are introduced within this context. Hereafter,
multilevel refers to the possibility of exploiting distinct
discretization levels (i.e. space/time resolution) within a single
model form, whereas multifidelity involves the use of more than one
model form.  In \S\ref{uq:sampling:controlvariate}, we describe the
control variate Monte Carlo algorithm that we align with multifidelity
sampling, and in \S\ref{uq:sampling:multilevel}, we describe the
multilevel Monte Carlo algorithm that we align with multilevel
sampling.  In \S\ref{uq:sampling:mlmf}, we show that these two
approaches can be combined to create multilevel-multifidelity sampling
approaches.

\section{Monte Carlo (MC)} \label{uq:sampling:montecarlo}
Monte Carlo is a popular algorithm for stochastic simulations due to its simplicity, flexibility, and the provably convergent behavior that is independent 
of the number of input uncertainties. A quantity of interest $Q: \Xi \rightarrow \mathbb{R}$, represented as a random variable (RV), 
can be introduced as a function of a random vector $\boldsymbol{\xi} \in \Xi \subset \mathbb{R}^d$. The goal of any MC simulation is computing 
statistics for $Q$, e.g. the expected value $\mathbb{E}\left[Q\right]$. The MC estimator $\hat{Q}_N^{MC}$ for 
$\mathbb{E}\left[Q\right]$ is defined as follows
\begin{equation}
\hat{Q}_N^{MC} = \dfrac{1}{N} \sum_{i=1}^N Q^{(i)},
\end{equation}
where $Q^{(i)} = Q(\boldsymbol{\xi}^{(i)})$ and $N$ is used to indicate the number of realizations of the model. 

For large $N$ it is trivial to demonstrate that the MC estimator is unbiased, i.e., its bias is zero and its convergence
to the true statistics is $\mathcal{O}(N^{-1/2})$. Moreover, since each sequence of realizations is different, another crucial property 
of any estimator is its own variance:
\begin{equation}\label{EQ: variance MC}
 \mathbb{V}ar\left( \hat{Q}_N^{MC} \right)  = \dfrac{1}{N^2} \sum_{i=1}^{N} \mathbb{V}ar\left( Q \right) 
            = \dfrac{\mathbb{V}ar\left(Q\right) }{N}.
\end{equation}
Furthermore, it is possible to show, in the limit $N \rightarrow \infty$, that the error $\left( \mathbb{E}\left[Q\right] - \hat{Q}_N^{MC} \right) \sim 
\sqrt{\dfrac{\mathbb{V}ar\left(Q\right) }{N}} \mathcal{N}(0,1)$, where $\mathcal{N}(0,1)$ represents a standard normal RV. As a consequence, it is possible to define a confidence
interval for the MC estimator which has an amplitude proportional to the standard deviation of the estimator.
Indeed, the variance of the estimator plays a fundamental role in the quality of the
numerical results: the reduction of the variance (for a fixed computational cost) 
is a very effective way of improving the quality of the MC prediction. 

\section{Control variate Monte Carlo} \label{uq:sampling:controlvariate}
A closer inspection of Eq.~\eqref{EQ: variance MC} indicates that only an increase in the number of simulations $N$ 
might reduce the overall variance, since $\mathbb{V}ar\left({Q}\right)$ is an intrinsic property of the model under analysis. However, 
more sophisticated  techniques have been proposed to accelerate the convergence of a MC simulation. For instance, an incomplete list 
of these techniques can include stratified sampling, importance sampling, Latin hypercube, deterministic Sobol' 
sequences and control variates. In particular, the control variate approach, is based on the idea of replacing
the RV $Q$ with a different one which has the same expected value, but with a smaller variance.
The goal is to reduce the numerator in Eq.~\eqref{EQ: variance MC}, and hence the value of the estimator variance without
requiring a larger number of simulations.
In a practical setting, the control variate makes use of an auxiliary functional $G=G(\boldsymbol{\xi})$ for which the 
expected value $\mathbb{E}\left[G\right]$ is known. Indeed, the alternative estimator can be defined as
\begin{equation} \label{EQ: control variate}
\hat{Q}_N^{MCCV} =  \hat{Q}_N^{MC} - \beta \left( \hat{G}_N^{MC} - \mathbb{E}\left[G\right] \right).
\end{equation}
The MC control variate estimator $\hat{Q}_N^{MCCV}$ is unbiased (irrespective of the value of the parameter 
$\beta \in \mathbb{R}$), but its variance now has a more complex dependence not only 
on the $\mathbb{V}ar\left({Q}\right)$, but also on $\mathbb{V}ar\left(G\right)$ and the covariance between $Q$ and $G$ since
\begin{equation}
 \mathbb{V}ar\left(\hat{Q}_N^{MCCV}\right) = \dfrac{1}{N} \left( \mathbb{V}ar\left( \hat{Q}_N^{MC} \right) + \beta^2 \mathbb{V}ar\left( \hat{G}_N^{MC} \right) - 2\beta \mathrm{Cov}\left(Q,G\right) \right).
\end{equation}
The parameter $\beta$ can be used to minimize the overall variance leading to 
\begin{equation}
\beta = \dfrac{ \mathrm{Cov}\left(Q,G\right) }{ \mathbb{V}ar\left( G \right) }, 
\end{equation}
for which the estimator variance follows as
\begin{equation}
 \mathbb{V}ar\left({\hat{Q}_N^{MCCV}}\right) = \mathbb{V}ar\left({\hat{Q}_N^{MC}}\right)\left( 1-\rho^2 \right).
\end{equation}
Therefore, the overall variance of the estimator $\hat{Q}_N^{MCCV}$ is proportional to the variance of the standard
MC estimator $\hat{Q}_N^{MC}$ through a factor $1-\rho^2$ where $\rho = \dfrac{ \mathrm{Cov}\left(Q,G\right) }{\sqrt{\mathbb{V}ar\left(Q\right)\mathbb{V}ar\left(G\right)}}$ is the Pearson
correlation coefficient between $Q$ and $G$. Since $0<\rho^2<1$, the variance $\mathbb{V}ar\left( \hat{Q}_N^{MCCV} \right)$ is always less than
the corresponding $\mathbb{V}ar\left({\hat{Q}_N^{MC}}\right)$. The control variate technique can be seen as a very general approach
to accelerate a MC simulation. The main step is to define a convenient control variate function which is cheap to evaluate and well correlated 
to the target function. For instance, function evaluations obtained through a different (coarse) resolution may be employed or even coming 
from a more crude physical/engineering approximation of the problem. A viable way of building a well correlated control variate is to 
rely on a low-fidelity model (i.e. a crude approximation of the model of interest) to estimate the control variate using estimated 
control means (see \cite{Pasupathy2014}). This technique has been introduced in the context of optimization under uncertainty in \cite{Ng2014}.

\section{Multilevel Monte Carlo} \label{uq:sampling:multilevel}
In general engineering applications, the quantity of interest $Q$
%, introduced in the previous section, 
is obtained as the result of the numerical solution of a partial partial differential equation (possibly a system of them).  Therefore, the dependence on the 
physical $\mathbf{x} \in \Omega\subset\mathbb{R}^n$ and/or temporal $t \in T\subset\mathbb{R^+}$ coordinates should be included, 
hence $Q=Q(\mathbf{x}, \boldsymbol{\xi}, t)$. A finite spatial/temporal resolution is always employed to numerically solve a PDE, implying
the presence of a discretization error in addition to the stochastic error. The term discretization is applied generically with reference 
to either the spatial tessellation, the temporal resolution, or both (commonly, they are linked). For a generic tessellation with $M$ 
degrees-of-freedom (DOFs), the PDE solution of $Q$ is referred to as $Q_M$. Since $Q_M \rightarrow Q$ for $M\rightarrow\infty$, 
then $\mathbb{E}\left[{Q_M}\right] \rightarrow \mathbb{E}\left[{Q}\right]$ for $M\rightarrow\infty$ with a prescribed order of convergence. 
A MC estimator in presence of a finite spatial resolution and finite sampling is
\begin{equation}
\hat{Q}^{MC}_{M,N} = \frac{1}{N} \sum_{i=1}^N Q_M^{(i)}
\end{equation}
for which the mean square error (MSE) is
\begin{equation}
\mathbb{E}\left[ (\hat{Q}^{MC}_{M,N}-\mathbb{E}\left[ Q \right] )^2 \right]
       = N^{-1} \mathbb{V}ar\left({Q_M}\right) + \left( \mathbb{E}\left[{ Q_M-Q }\right] \right)^2,
\end{equation}
where the first term represents the variance of the estimator, and the second term $\left( \mathbb{E}\left[ Q_M-Q \right] \right)^2$ reflects the bias
introduced by the (finite) spatial discretization. The two contributions appear to be independent of each other; accurate MC estimates can only be obtained by drawing the required $N$ number of simulations of $Q_M( \boldsymbol{\xi} )$ at 
a sufficiently fine resolution $M$. Since the numerical cost of a PDE is related to the number of DOFs of the tessellation,
the total cost of a MC simulation for a PDE can easily become intractable for complex multi-physics applications that are computationally 
intensive.

\subsection{Multilevel Monte Carlo for the mean}

The multilevel Monte Carlo (MLMC) algorithm has been introduced, starting from the control variate idea, for situation in which additional 
%(with respect to the stochastic space) 
discretization levels can be defined. The basic idea, borrowed from the multigrid approach,
is to replace the evaluation of the statistics of $Q_M$ with a sequence of evaluations at coarser levels. If it is 
possible to define a sequence of discretization levels $\left\{ M_\ell: \ell = 0, \dots, L \right\}$ with 
$M_0 < M_1 < \dots < M_L \stackrel{\mathrm{def}}{=} M$, the expected value $\mathbb{E}\left[{Q_M}\right]$ can be decomposed, 
exploiting the linearity of the expected value operator as
\begin{equation}
 \mathbb{E}\left[{Q_{M}}\right] = \mathbb{E}\left[{Q_{M_0}}\right] + \sum_{\ell = 1}^L \mathbb{E }\left[ Q_{M_{\ell}} - Q_{M_{\ell-1}} \right].
\end{equation}
If the difference function $Y_\ell$ is defined according to
\begin{equation}
 Y_\ell = \left\{
 \begin{split}
 Q_{M_0} \quad &\mathrm{if} \quad \ell=0 \\
 Q_{M_{\ell}} - Q_{M_{\ell-1}} \quad &\mathrm{if} \quad 0<\ell\leq L,
 \end{split}
 \right.
\end{equation}
the expected value $\mathbb{E}\left[{Q_M}\right]=\sum_{\ell=0}^{L}{  \mathbb{E}\left[Y_\ell\right]   }$. A multilevel MC estimator is obtained 
when a MC estimator is adopted independently for the evaluation of the expected value of $Y_\ell$ on each level. The resulting multilevel 
estimator $\hat{Q}_M^{\mathrm{ML}}$ is 
\begin{equation}
 \hat{Q}_M^{\mathrm{ML}} = \, \sum_{\ell = 0}^L \hat{Y}_{\ell, N_\ell}^{\mathrm{MC}} 
 = \sum_{\ell = 0}^L \frac{1}{N_\ell} \sum_{i=1}^{N_\ell} Y_\ell^{(i)}.
\end{equation}
Since the multilevel estimator is unbiased, the advantage of using this formulation is in its reduced estimator variance 
$\sum_{\ell=0}^{L} N_\ell^{-1} \mathbb{V}ar\left({Y_\ell}\right)$: since $Q_M \rightarrow Q$, 
the difference function $Y_\ell \rightarrow 0$ as the level $\ell$ increases. Indeed, the corresponding number of
samples $N_\ell$ required to resolve the variance associated with the $\ell$th level is expected 
to decrease with $\ell$.
 
The MLMC algorithm can be interpreted as a strategy to optimally allocate 
resources. If the total cost of the MLMC algorithm is written as  
\begin{equation}\label{EQ: MLMC cost}
\mathcal{C}(\hat{Q}^{ML}_{M}) = \sum_{\ell=0}^{L} N_\ell \, \mathcal{C}_{\ell},
\end{equation}
with $\mathcal{C}_{\ell}$ being the cost of the evaluation of $Y_\ell$ (involving either one or two discretization evaluations), then the following constrained minimization problem can be formulated where an equality constraint enforces a stochastic error (from MLMC estimator variance) equal to the residual bias error ($\varepsilon^2/2$)
\begin{equation}\label{EQ:mlmc_optimization}
 f(N_\ell,\lambda) = \sum_{\ell=0}^{L} N_\ell \, \mathcal{C}_{\ell} 
                   + \lambda \left( \sum_{\ell=0}^{L} N_\ell^{-1} \mathbb{V}ar\left({Y_\ell}\right) - \varepsilon^2/2 \right). 
\end{equation}
using a Lagrange multiplier $\lambda$.  This equality constraint reflects a balance between the two contributions to MSE, reflecting the goal to not over-resolve one or the other.  The result of the minimization is
\begin{equation}\label{EQ: MLMC nl}
N_{\ell} = \frac{2}{\varepsilon^2} \left[ \, \sum_{k=0}^L \left( \mathbb{V}ar\left(Y_k\right) \mathcal{C}_k \right)^{1/2} \right] 
               \sqrt{\frac{ \mathbb{V}ar\left({Y_\ell}\right) }{\mathcal{C}_{\ell}}},
\end{equation}
defining an optimal sample allocation per discretization level.

\subsection{MLMC extension to the variance}
Despite the original introduction of the MLMC approach for the computation of the mean estimator in \cite{Giles2008,Giles2015}, it is 
possible to estimate higher-order moments with a MLMC sampling strategy, as for instance the variance.

A single level unbiased estimator for the variance of a generic QoI at the highest level $M_L$ of the hierarchy can be written as
\begin{equation}
\label{eq: variance_est_single_level}
 \mathbb{V}ar\left[Q_{M_L}\right] \approx \frac{1}{N_{M_L} - 1} \sum_{i=1}^{N_{M_L}} \left( Q_{M_L}^{(i)} - \mathbb{E}\left[Q_L\right] \right)^2.
\end{equation}

The multilevel version of Eq.~\eqref{eq: variance_est_single_level} can be obtained via a telescopic expansion in term of difference of estimators over subsequent levels. To simplify the notation and for simplicity of exposure from now on 
we only indicate the level, \textit{i.e.} $M_\ell = \ell$. 

The expansion is obtained by re-writing Eq.~\eqref{eq: variance_est_single_level} as 
\begin{equation}
\begin{split}
\label{eq: variance_est_ML}
 \mathbb{V}ar\left[Q_L\right] &\approx       \frac{1}{N_L - 1} \sum_{i=1}^{N_L} \left( Q_L^{(i)} - \mathbb{E}\left[Q_L\right] \right)^2 \\
                              &\approx \sum_{\ell=0}^L  \frac{1}{N_\ell - 1} \left( \left( Q_{\ell}^{(i)} - \mathbb{E}\left[Q_{\ell}\right] \right)^2 
                                                                                  - \left( Q_{{\ell-1}}^{(i)} - \mathbb{E}\left[Q_{\ell-1}\right] \right)^2 \right).
\end{split}
\end{equation}
It is important here to note that since the estimators at the levels $\ell$ and $\ell-1$ are computed with the same number of samples both estimators use the factor 
$1/(N_\ell-1)$ to obtain their unbiased version. Moreover, each estimator is indeed written with respect to its own mean value, \textit{i.e.} the mean value on its level,
either $\ell$ or $\ell-1$. This last requirement leads to the computation of a local expected value estimator with respect to the same samples employed for the difference estimator. If we now denote with $\hat{Q}_{\ell,2}$ the sampling estimator for the second order moment of the QoI $Q_\ell$ we can write
\begin{equation}
\begin{split}
\label{eq: variance_est_ML_approximation}
 \mathbb{V}ar\left[Q_L\right] \approx \hat{Q}_{L,2}^{\mathrm{ML}} = \sum_{\ell=0}^L \hat{Q}_{\ell,2} - \hat{Q}_{\ell-1,2},
\end{split}
\end{equation}

where 
\begin{equation}
\label{eq: variance_est_ML_level_terms}
 \hat{Q}_{\ell,2} = \frac{1}{N_\ell - 1} \sum_{i=1}^{N_\ell} \left( Q_\ell^{(i)} - \hat{Q}_\ell \right)^2
\text{\quad  and \quad}
 \hat{Q}_{\ell - 1,2} = \frac{1}{N_\ell - 1} \sum_{i=1}^{N_\ell} \left( Q_{\ell - 1}^{(i)} - \hat{Q}_{\ell - 1} \right)^2.
\end{equation}
Note that $\hat{Q}_{\ell,2}$ and $\hat{Q}_{\ell - 1,2}$ are explicitly sharing the same samples $N_\ell$.

For this estimator we are interested in minimizing its cost while also prescribing its variance as done for the expected value. This is accomplished by evaluating the 
variance of the multilevel variance estimator $\hat{Q}_{L,2}^{ML}$
\begin{equation}
 \mathbb{V}ar\left[ \hat{Q}_{L,2}^{\mathrm{ML}} \right] = \sum_{\ell=0}^L \mathbb{V}ar\left[ \hat{Q}_{\ell,2} - \hat{Q}_{\ell-1,2} \right]
                                               = \sum_{\ell=0}^L \mathbb{V}ar\left[ \hat{Q}_{\ell,2} \right] + \mathbb{V}ar\left[\hat{Q}_{\ell-1,2} \right]
                                               - 2 \mathbb{C}ov\left( \hat{Q}_{\ell,2},\hat{Q}_{\ell-1,2} \right),
\end{equation}
where the covariance term is a result of the dependence described in~\eqref{eq: variance_est_ML_level_terms}.

The previous expression can be evaluated once the variance for the sample estimator of the second order order moment $\mathbb{V}ar\left[ \hat{Q}_{\ell,2} \right]$ and the covariance term $\mathbb{C}ov\left( \hat{Q}_{\ell,2},\hat{Q}_{\ell-1,2} \right)$ are known. These terms can be evaluated as:
\begin{equation}
 \mathbb{V}ar\left[ \hat{Q}_{\ell,2} \right] \approx \frac{1}{N_\ell} \left( \hat{Q}_{\ell,4} - \frac{N_\ell-3}{N_\ell-1} \left(\hat{Q}_{\ell,2}\right)^2 \right),
\end{equation}
where $\hat{Q}_{\ell,4}$ denotes the sampling estimator for the fourth order central moment.

The expression for the covariance term is more involved and can be written as
\begin{equation}
\begin{split}
 \mathbb{C}ov\left( \hat{Q}_{\ell,2},\hat{Q}_{\ell-1,2} \right) &\approx \frac{1}{N_\ell} \mathbb{E}\left[ \hat{Q}_{\ell,2},\hat{Q}_{\ell-1,2} \right] \\
                                                                      &+ \frac{1}{N_\ell N_{\ell-1}} \left( \mathbb{E}\left[ Q_\ell Q_{\ell-1} \right]^2
                                                                      - 2  \mathbb{E}\left[ Q_\ell Q_{\ell-1} \right] \mathbb{E}\left[ Q_\ell \right] \mathbb{E}\left[Q_{\ell-1} \right] + \left( \mathbb{E}\left[ Q_\ell \right] \mathbb{E}\left[Q_{\ell-1} \right] \right)^2
                                                                      \right).
\end{split}
\end{equation}

The first term of the previous expression is evaluated by estimating and combining several sampling moments as
\begin{equation}
\begin{split}
 \mathbb{E}\left[ \hat{Q}_{\ell,2},\hat{Q}_{\ell-1,2} \right] &= \frac{1}{N_\ell} \left( \mathbb{E}\left[ Q_\ell^2 Q_{\ell-1}^2 \right] \right) - \mathbb{E}\left[ Q_\ell^2 \right] \mathbb{E}\left[Q_{\ell-1}^2 \right] - 2 \mathbb{E}\left[Q_{\ell-1} \right] \mathbb{E}\left[ Q_{\ell}^2 Q_{\ell-1} \right] \\
                                      &+ 2 \mathbb{E}\left[Q_{\ell-1}^2 \right] \mathbb{E}\left[ Q_{\ell}^2 \right]
                                      - 2  \mathbb{E}\left[ Q_{\ell} \right] \mathbb{E}\left[ Q_{\ell} Q_{\ell-1}^2 \right]
                                      + 2 \mathbb{E}\left[ Q_{\ell} \right]^2 \mathbb{E}\left[ Q_{\ell-1}^2 \right] \\
                                      &+ 4 \mathbb{E}\left[ Q_{\ell} \right] \mathbb{E}\left[ Q_{\ell-1} \right] \mathbb{E}\left[ Q_{\ell} Q_{\ell-1} \right]
                                      - 4 \mathbb{E}\left[ Q_{\ell} \right]^2 \mathbb{E}\left[ Q_{\ell-1} \right]^2.
\end{split}
\end{equation}
It is important to note here that the previous expression can be computed only if several sampling estimators for product of the QoIs at levels $\ell$ and $\ell-1$ are available.
These quantities are not required in the standard MLMC implementation for the mean and therefore for the estimation of the variance more data need to be stored to assemble the
quantities on each level.

An optimization problem, similar to the one formulated for the mean in the previous section, can be written in the case of variance 
\begin{equation}\label{EQ:mlmc_optimization_var}
\begin{split}
\min\limits_{N_\ell} \sum_{\ell=0}^L \mathcal{C}_{\ell} N_\ell \quad \mathrm{s.t.} \quad \mathbb{V}ar\left[ \hat{Q}_{L,2}^{\mathrm{ML}} \right] = \varepsilon^2/2.
% 
% 
%  f(N_\ell,\lambda) = \sum_{\ell=0}^{L} N_\ell \, \mathcal{C}_{\ell} 
%                    + \lambda \left( \sum_{\ell=0}^{L} N_\ell^{-1} \mathbb{V}ar\left({Y_\ell}\right) - \varepsilon^2/2 \right). 
\end{split}
\end{equation}

This optimization problem can be solved in two different ways, namely an analytical approximation and by solving a non-linear optimization problem. The analytical approximation follows the approach described in \cite{Pisaroni2017} and introduces a helper variable
\begin{equation}
\hat{V}_{2, \ell} := \mathbb{V}ar\left[ \hat{Q}_{\ell,2} \right] \cdot N_{\ell}.
\end{equation}
Next, the following constrained minimization problem is formulated
\begin{equation}\label{EQ:mlmc_var_optimization_nobile}
 f(N_\ell,\lambda) = \sum_{\ell=0}^{L} N_\ell \, \mathcal{C}_{\ell} 
                   + \lambda \left( \sum_{\ell=0}^{L} N_\ell^{-1} \hat{V}_{2, \ell} - \varepsilon^2/2 \right),
\end{equation}
and a closed form solution is obtained
\begin{equation}\label{EQ: MLMC_nl_var_nobile}
N_{\ell} = \frac{2}{\varepsilon^2} \left[ \, \sum_{k=0}^L \left( \hat{V}_{2, k} \mathcal{C}_k \right)^{1/2} \right] 
               \sqrt{\frac{ \hat{V}_{2, \ell} }{\mathcal{C}_{\ell}}},
\end{equation}
similarly as for the expected value in~\eqref{EQ:mlmc_optimization}.
%Note here, that higher order terms of $N_\ell$ in $\mathbb{V}ar\left[ \hat{Q}_{\ell,2} \right]$ are not considered in the optimization. 

The second approach uses numerical optimization directly on the non-linear optimization problem~\eqref{EQ:mlmc_optimization_var} to find an optimal sample allocation. Dakota uses OPTPP as the default optimizer and switches to NPSOL if it is available. 

Both approaches for finding the optimal sample allocation when allocating for the variance are currently implemented in Dakota. The analytical solution is employed by default while the optimization is enabled using a keyword. We refer to the reference manual for a discussion of the keywords to select these different options.


\subsection{MLMC extension to the standard deviation}
The extension of MLMC for the standard deviation is slightly more complicated by the presence of the square root, which prevents a straightforward expansion over levels. 

One possible way of obtaining a biased estimator for the standard deviation is
\begin{equation}
\hat{\sigma}_L^{ML} = \sqrt{ \sum_{\ell=0}^L \hat{Q}_{\ell,2} - \hat{Q}_{\ell - 1,2} }.
\end{equation}

To estimate the variance of the standard deviation estimator, it is possible to leverage the result, derived in the previous section for the variance, and write the variance of the standard deviation as a function of the variance and its estimator variance. If we can estimate the variance $\hat{Q}_{L,2}$ and its estimator variance $\mathbb{V}ar\left[ \hat{Q}_{L,2} \right]$, the variance for the standard deviation $\hat{\sigma}_L^{ML}$ can be approximated as
\begin{equation}
 \mathbb{V}ar\left[ \hat{\sigma}_L^{ML} \right] \approx \frac{1}{4 \hat{Q}_{L,2}} \mathbb{V}ar\left[ \hat{Q}_{L,2} \right].
\end{equation}

Similarly to the variance case, a numerical optimization problem can be solved to obtain the sample allocation for the estimator of the standard deviation given a prescribed accuracy target. 


\subsection{MLMC extension to the scalarization function}
Often, especially in the context of optimization, it is necessary to estimate statistics of a metric defined as a linear combination of mean and standard deviation of a QoI. A classical reliability measure $c^{ML}[Q]$ can be defined, for the quantity $Q$, starting from multilevel (ML) statistics, as
\begin{equation}
 c_L^{ML}[Q] = \hat{Q}_{L}^{ML}  + \alpha \hat{\sigma}_L^{ML}.
\end{equation}

To obtain the sample allocation, in the MLMC context, it is necessary to evaluate the variance of $c_L^{ML}[Q]$, which can be written as 
\begin{equation}
\mathbb{V}ar\left[ c_L^{ML}[Q] \right] = \mathbb{V}ar\left[ \hat{Q}_{L}^{ML} \right] + \alpha^2 \mathbb{V}ar\left[ \hat{\sigma}_L^{ML} \right] 
                                        + 2 \alpha \mathbb{C}ov\left[ \hat{Q}_{L}^{ML}, \hat{\sigma}_L^{ML} \right].
\end{equation}
This expression requires, in addition to the already available terms $\mathbb{V}ar\left[ \hat{Q}_{L}^{ML} \right]$ and $\mathbb{V}ar\left[ \hat{\sigma}_L^{ML} \right]$, also the covariance term $\mathbb{C}ov\left[ \hat{Q}_{L}^{ML}, \hat{\sigma}_L^{ML} \right]$. This latter term can be written knowing that shared samples are only present on the same level
\begin{equation}
\begin{split}
 \mathbb{C}ov\left[ \hat{Q}_{L}^{ML}, \hat{\sigma}_L^{ML} \right] &= \mathbb{C}ov\left[ \sum_{\ell=0}^{L} \hat{Q}_{\ell} - \hat{Q}_{\ell-1}, \sum_{\ell=0}^{L} \hat{\sigma}_{\ell} - \hat{\sigma}_{\ell-1} \right] \\
                                                                  &= \sum_{\ell=0}^{L} \mathbb{C}ov\left[ \hat{Q}_{\ell} - \hat{Q}_{\ell-1}, \hat{\sigma}_{\ell} - \hat{\sigma}_{\ell-1} \right],
\end{split}
\end{equation}
which leads to the need for evaluating the following four contributions 
\begin{equation}
 \mathbb{C}ov\left[ \hat{Q}_{\ell} - \hat{Q}_{\ell-1}, \hat{\sigma}_{\ell} - \hat{\sigma}_{\ell-1} \right] =
 \mathbb{C}ov\left[ \hat{Q}_{\ell} , \hat{\sigma}_{\ell} \right] - \mathbb{C}ov\left[ \hat{Q}_{\ell} , \hat{\sigma}_{\ell-1} \right]
 - \mathbb{C}ov\left[ \hat{Q}_{\ell-1}, \hat{\sigma}_{\ell} \right] + \mathbb{C}ov\left[ \hat{Q}_{\ell-1}, \hat{\sigma}_{\ell-1} \right].
\end{equation}

In Dakota, we adopt the following approximation, for two arbitrary levels $\ell$ and $\kappa \in \left\{ \ell-1, \ell, \ell+1 \right\}$ 
\begin{equation}
 \rho\left[ \hat{Q}_{\ell}, \hat{\sigma}_{\kappa} \right] \approx \rho\left[ \hat{Q}_{\ell}, \hat{Q}_{\kappa,2} \right]
 \footnote{We indicate with $\hat{Q}_{\kappa,2}$ the second central moment for $Q$ at the level $\kappa$.},
\end{equation}
which corresponds to assuming that the correlation between expected value and variance is a good approximation of the correlation between the expected value and the standard deviation. This assumption is particularly convenient because it is possible to obtain in closed form the covariance between expected value and variance and, therefore, we can adopt the following approximation
\begin{equation}
\begin{split}
 \frac{ \mathbb{C}ov\left[ \hat{Q}_{\ell}, \hat{\sigma}_{\kappa} \right]}{\sqrt{ \mathbb{V}ar\left[ \hat{Q}_{\ell} \right] \mathbb{V}ar\left[ \hat{\sigma}_{\kappa} \right]} } 
 \approx \frac{\mathbb{C}ov\left[ \hat{Q}_{\ell}, \hat{Q}_{\kappa,2} \right]}{\sqrt{ \mathbb{V}ar\left[ \hat{Q}_{\ell}\right] \mathbb{V}ar\left[ \hat{Q}_{\kappa,2}\right] }} \\
 %
 \mathbb{C}ov\left[ \hat{Q}_{\ell}, \hat{\sigma}_{\kappa} \right] 
 \approx \mathbb{C}ov\left[ \hat{Q}_{\ell}, \hat{Q}_{\kappa,2} \right] \frac{\sqrt{\mathbb{V}ar\left[ \hat{\sigma}_{\kappa} \right]}}{\sqrt{  \mathbb{V}ar\left[ \hat{Q}_{\kappa,2}\right] }}.
\end{split}
\end{equation}

Finally, we can derive the term $\mathbb{C}ov\left[ \hat{Q}_{\ell}, \hat{Q}_{\kappa,2} \right]$ for all possible cases
\begin{equation}
 \mathbb{C}ov\left[ \hat{Q}_{\ell}, \hat{Q}_{\kappa,2} \right] = 
 \begin{cases}
    \frac{1}{N_\ell} \left( \mathbb{E}\left[ Q_\ell Q_{\kappa}^2 \right] 
                          - \mathbb{E}\left[ Q_\ell \right] \mathbb{E}\left[ Q_{\kappa}^2 \right] 
                          - 2 \mathbb{E}\left[ Q_{\kappa} \right] \mathbb{E}\left[ Q_\ell Q_{\kappa} \right]
                          + 2 \mathbb{E}\left[ Q_\ell \right] \mathbb{E}\left[ Q_\kappa^2 \right]
                          \right),& \text{if } \kappa \neq \ell \\
    \frac{\hat{Q}_{\ell,3}}{N_\ell},              & \text{if }  \kappa = \ell.
\end{cases}
\end{equation}


% In this case, in order to obtain the variance of $c^{ML}[Q]$ it is necessary to employ an additional approximation:
% \begin{equation}
% \begin{split}
%  \mathbb{V}ar\left[ c^{ML}[Q] \right] &= \mathbb{V}ar\left[ \hat{Q}_{L}^{ML} \right] + \alpha^2 \mathbb{V}ar\left[ \hat{\sigma}_L^{ML} \right] 
%                                       + 2 \alpha \mathbb{C}ov\left[ \hat{Q}_{L}^{ML}, \hat{\sigma}_L^{ML} \right] \\
%                                       &= \mathbb{V}ar\left[ \hat{Q}_{L}^{ML} \right] + \alpha^2 \mathbb{V}ar\left[ \hat{\sigma}_L^{ML} \right] 
%                                       + 2 \alpha \rho\left[\hat{Q},\hat{\sigma}\right] \sqrt{ \mathbb{V}ar\left[ \hat{Q}_{L}^{ML} \right] }  \sqrt{ \mathbb{V}ar\left[ \hat{\sigma}_L^{ML} \right] } \\
%                                       &\leq \mathbb{V}ar\left[ \hat{Q}_{L}^{ML} \right] + \alpha^2 \mathbb{V}ar\left[ \hat{\sigma}_L^{ML} \right] 
%                                       + 2 |\alpha| \sqrt{ \mathbb{V}ar\left[ \hat{Q}_{L}^{ML} \right] }  \sqrt{ \mathbb{V}ar\left[ \hat{\sigma}_L^{ML} \right] },
% \end{split}
% \end{equation}
% 
% which permits to bound the maximum value for the variance (assuming a very conservative approximation for the correlation between the estimators for the mean and the standard deviation, \textit{i.e.} $\left|\rho\left[\hat{Q},\hat{\sigma}\right]\right|=1$).

% All terms in the previous expression can be written as a function of the quantities derived in the previous sections, and, therefore, even for this case the allocation problem can be solved by resorting to a numerical optimization given a prescribed target.
Even for this case, the sample allocation problem can be solved by resorting to a numerical optimization given a prescribed target.

\section{A multilevel-multifidelity approach} \label{uq:sampling:mlmf}

The MLMC approach described in \S\ref{uq:sampling:multilevel} can be considered
to be a recursive control variate technique in that it seeks to reduce
the variance of the target function in order to limit the sampling at
high resolution. In addition, the difference function $Y_\ell$ for
each level can itself be the target of an additional control variate
(refer to \S\ref{uq:sampling:controlvariate}). A practical scenario is
when not only different resolution levels are available (multilevel
part), but also a cheaper computational model can be used
(multifidelity part). The combined approach is a
multilevel-multifidelity algorithm, and in particular, a
multilevel-control variate Monte Carlo sampling approach.

\subsection{$Y_l$ correlations} \label{uq:sampling:mlmf:Ycorr}

If the target QoI can be generated from both a high-fidelity (HF) model 
and a cheaper, possibly biased low-fidelity (LF) model, it is possible 
to write the following estimator
\begin{equation}\label{EQ: MLMF estimator}
 \mathbb{E}\left[Q_M^{\mathrm{HF}}\right] = \sum_{l=0}^{L_{\mathrm{HF}}} \mathbb{E}\left[Y^{\mathrm{HF}}_{\ell}\right] 
                                          \approx \sum_{l=0}^{L_{\mathrm{HF}}} \hat{Y}^{\mathrm{HF}}_{\ell} = \sum_{l=0}^{L_{\mathrm{HF}}} Y^{{\mathrm{HF}},\star}_{\ell},
\end{equation}
where
\begin{equation}
 Y^{{\mathrm{HF}},\star}_{\ell} = Y^{\mathrm{HF}}_{\ell} + \alpha_\ell \left( \hat{Y}^{\mathrm{LF}}_{\ell} - \mathbb{E}\left[{Y^{\mathrm{LF}}_{\ell}}\right] \right).
\end{equation}

The estimator $Y^{\mathrm{HF},\star}_{\ell}$ is unbiased with respect to $\hat{Y}^{\mathrm{HF}}_{\ell}$, hence with 
respect to the true value $\mathbb{E}\left[Y^{\mathrm{HF}}_{\ell}\right]$.
The control variate is obtained by means of the LF model realizations for which the expected value can be 
computed in two different ways: $\hat{Y}^{\mathrm{LF}}_{\ell}$ and $\mathbb{E}\left[Y^{\mathrm{LF}}_{\ell}\right]$. 
A MC estimator is employed for each term but the estimation of $\mathbb{E}\left[Y^{\mathrm{LF}}_{\ell}\right]$ is more resolved than $\hat{Y}^{\mathrm{LF}}_{\ell}$. For $\hat{Y}^{\mathrm{LF}}_{\ell}$, we choose the number of 
LF realizations to be equal to the number of HF realizations, $N_{\ell}^{\mathrm{HF}}$.  For the more resolved $\mathbb{E}\left[Y^{\mathrm{LF}}_{\ell}\right]$, we
augment with an additional and independent set of realizations $\Delta_{\ell}^{\mathrm{LF}}$, 
hence $N_{\ell}^{\mathrm{LF}} = N_{\ell}^{\mathrm{HF}} + \Delta_{\ell}^{\mathrm{LF}}$. The set $\Delta_{\ell}^{\mathrm{LF}}$ is
written, for convenience, as proportional to $N_{\ell}^{\mathrm{HF}}$ by means of a parameter $r_{\ell} \in \mathbb{R}^+_0$
\begin{equation}
 N_{\ell}^{\mathrm{LF}} = N_{\ell}^{\mathrm{HF}} + \Delta_{\ell}^{\mathrm{LF}} = N_{\ell}^{\mathrm{HF}} + r_{\ell} N_{\ell}^{\mathrm{HF}} 
                        = N_{\ell}^{\mathrm{HF}} (1 + r_{\ell}).
\end{equation}
The set of samples $\Delta_{\ell}^{\mathrm{LF}}$ is independent of $N_{\ell}^{\mathrm{HF}}$, therefore the variance of the 
estimator can be written as (for further details see \cite{GeraciCTR})
\begin{equation}\label{EQ: MLMF mean}
\begin{split}
\mathbb{V}ar\left(\hat{Q}_M^{MLMF}\right) &= \sum_{l=0}^{L_{\mathrm{HF}}} \left( \dfrac{1}{N_{\ell}^{\mathrm{HF}}} \mathbb{V}ar\left(Y^{\mathrm{HF}}_{\ell}\right) 
                                          + \dfrac{\alpha_\ell^2 r_\ell}{(1+r_\ell) N_{\ell}^{\mathrm{HF}}} \mathbb{V}ar\left(Y^{\mathrm{HF}}_{\ell}\right) \right. \\
              &+  \left. 2 \dfrac{\alpha_\ell r_\ell^2}{(1+r_\ell) N_{\ell}^{\mathrm{HF}}} \rho_\ell \sqrt{ \mathbb{V}ar\left(Y^{\mathrm{HF}}_{\ell}\right) 
                                                                                                      \mathbb{V}ar\left(Y^{\mathrm{LF}}_{\ell}\right) } \right),
\end{split}
\end{equation}

The Pearson's correlation coefficient between the HF and LF models is indicated by $\rho_\ell$ in the previous equations.
Assuming the vector $r_\ell$ as a parameter, the variance is minimized per level, mimicking the standard control variate 
approach, and thus obtaining the optimal coefficient as 
$\alpha_\ell = -\rho_\ell \sqrt{ \dfrac{ \mathbb{V}ar\left( Y^{\mathrm{HF}}_{\ell} \right) } 
                                { \mathbb{V}ar\left( Y^{\mathrm{LF}}_{\ell}  \right)     }}$. 
By making use of the optimal coefficient $\alpha_\ell$, it is possible to show that the variance $\mathbb{V}ar\left(Y^{\mathrm{HF},\star}_{\ell}\right)$ 
is proportional to the variance $\mathbb{V}ar\left(Y^{\mathrm{HF}}_{\ell}\right)$ through a factor $\Lambda_{\ell}(r_\ell)$, which is an explicit 
function of the ratio $r_\ell$:
\begin{equation}\label{EQ: MLMF variance}
\begin{split}
 \mathbb{V}ar\left(\hat{Q}_M^{MLMF}\right) &= \sum_{l=0}^{L_{\mathrm{HF}}} \dfrac{1}{N_{\ell}^{\mathrm{HF}}} \mathbb{V}ar\left(Y^{\mathrm{HF}}_{\ell}\right)
 \Lambda_{\ell}(r_\ell) \quad \mathrm{where} \\
 \Lambda_{\ell}(r_\ell) &= \left( 1 - \dfrac{r_\ell}{1+r_\ell}\rho_\ell^2 \right).
\end{split}
\end{equation}
Note that $\Lambda_{\ell}(r_\ell)$ represents a penalty with respect to the classical 
control variate approach presented in \S\ref{uq:sampling:controlvariate}, which stems from 
the need to evaluate the unknown function $\mathbb{E}\left[Y^{\mathrm{LF}}_{\ell}\right]$. However, the ratio $r_\ell/(r_\ell+1)$ is dependent 
on the additional number of LF evaluations $\Delta_{\ell}^{\mathrm{LF}}$, hence it is fair to assume that it 
can be made very close to unity by choosing an affordably large $r_\ell$, i.e., $\Delta_{\ell}^{\mathrm{LF}} >> N_{\ell}^{\mathrm{HF}}$.

The optimal sample allocation is determined taking into account the relative cost between the HF and LF models and their correlation (per level).
In particular the optimization problem introduced in Eq.~\eqref{EQ:mlmc_optimization} is replaced by
 \begin{equation*}
  \operatornamewithlimits{argmin}\limits_{N_{\ell}^{\mathrm{HF}}, r_\ell}(\mathcal{L}), \quad \mathrm{where} \quad \mathcal{L} = \sum_{\ell=0}^{L_{\mathrm{HF}}} N_{\ell}^{\mathrm{HF}} \mathcal{C}_{\ell}^{\mathrm{eq}} +
                 \lambda \left( \sum_{\ell=0}^{L_{\mathrm{HF}}} \dfrac{1}{N_{\ell}^{\mathrm{HF}}}\mathbb{V}ar\left( Y^{\mathrm{HF}}_{\ell}\right) \Lambda_{\ell}(r_\ell) - \varepsilon^2/2 \right),
 \end{equation*}
where the optimal allocation is obtained as well as the optimal ratio $r_\ell$. The cost per level includes now the sum of the HF and LF realization cost,
therefore it can be expressed as $\mathcal{C}_{\ell}^{\mathrm{eq}} = \mathcal{C}_{\ell}^{\mathrm{HF}} + \mathcal{C}_{\ell}^{\mathrm{LF}} (1+r_\ell)$.

If the cost ratio between the HF and LF model is $w_{\ell} =  \mathcal{C}_{\ell}^{\mathrm{HF}} / \mathcal{C}_{\ell}^{\mathrm{LF}}$ then the optimal ratio
is 
\begin{equation}
 r_\ell^{\star} = -1 + \sqrt{ \dfrac{\rho_\ell^2}{1-\rho_\ell^2} w_{\ell}},
\end{equation}
and
the optimal allocation is
\begin{equation}
 \begin{split}
  N_{\ell}^{\mathrm{HF},\star} &= \frac{2}{\varepsilon^2} \!\! \left[ \, \sum_{k=0}^{L_{\mathrm{HF}}} 
        \left( \dfrac{ \mathbb{V}ar\left(  Y_k^{ \mathrm{HF} } \right) \mathcal{C}_{k}^{\mathrm{HF}}}{1-\rho_\ell^2} \right)^{1/2} \Lambda_{k}(r_k^{\star}) \right] 
               \sqrt{ \left( 1 - \rho_\ell^2 \right) \frac{ \mathbb{V}ar\left(Y^{\mathrm{HF}}_{\ell}\right) }{\mathcal{C}_{\ell}^{\mathrm{HF}}}}.
\end{split}
\end{equation}
               
               
It is clear that the efficiency of the algorithm is related not only to the efficiency of the LF model, i.e. how fast a simulation runs with respect
to the HF model, but also to the correlation between the LF and HF model.

\subsection{$Q_l$ correlations} \label{uq:sampling:mlmf:Ycorr}

A potential refinement of the previous approach consists in exploiting the QoI on each pair of levels, $\ell$ and $\ell-1$, to 
build a more correlated LF function. For instance, it is possible to use
\begin{equation}
 \mathring{Y}^{\mathrm{LF}}_{\ell} =  \gamma_\ell Q_\ell^{\mathrm{LF}} - Q_{\ell-1}^{\mathrm{LF}}
\end{equation}
and maximize the correlation between $Y_\ell^{\mathrm{HF}}$ and $\mathring{Y}^{\mathrm{LF}}_{\ell}$ through the coefficient $\gamma_\ell$.

Formally the two formulations are completely equivalent if $Y_\ell^{\mathrm{LF}}$ is replaced with $\mathring{Y}^{\mathrm{LF}}_{\ell}$ in 
Equation~\eqref{EQ: MLMF estimator} and they can be linked through the two ratios
  \begin{equation}
 \begin{split}
 \theta_{\ell} &= \dfrac{  \mathrm{Cov}\left(  Y^{\mathrm{HF}}_{\ell},\mathring{Y}^{\mathrm{LF}}_{\ell} \right)   }
                        {  \mathrm{Cov}\left( Y^{\mathrm{HF}}_{\ell},Y^{\mathrm{LF}}_{\ell} \right)  } \\
 \quad \tau_{\ell}  &= \dfrac{  \mathbb{V}ar\left(  \mathring{Y}^{\mathrm{LF}}_{\ell} \right)  }{ \mathbb{V}ar\left( Y^{\mathrm{LF}}_{\ell} \right) },
 \end{split}
\end{equation}
obtaining the following variance for the estimator
\begin{equation}
 \mathbb{V}ar\left(\hat{Q}_M^{MLMF} \right) = \dfrac{1}{N_{\ell}^{\mathrm{HF}}} \mathbb{V}ar\left( Y^{\mathrm{HF}}_{\ell} \right) 
 \left( 1 - \dfrac{r_\ell}{1+r_\ell} \rho_\ell^2 \dfrac{\theta_\ell^2}{\tau_\ell} \right).
\end{equation}

Therefore, a way to increase the variance reduction is to maximize the ratio $\dfrac{\theta_\ell^2}{\tau_\ell}$ with respect to the 
parameter $\gamma_\ell$. It is possible to solve analytically this maximization problem obtaining 
\begin{equation}
\gamma_\ell^\star= \dfrac{ \mathrm{Cov}\left(  Y^{\mathrm{HF}}_{\ell},Q_{\ell-1}^{\mathrm{LF}} \right) \mathrm{Cov}\left( Q_{\ell}^{\mathrm{LF}},Q_{\ell-1}^{\mathrm{LF}} \right) 
                   - \mathbb{V}ar\left(Q_{\ell-1}^{\mathrm{LF}}\right) \mathrm{Cov}\left(  Y^{\mathrm{HF}}_{\ell},Q_{\ell}^{\mathrm{LF}} \right) }
            { \mathbb{V}ar\left(Q_{\ell}^{\mathrm{LF}}\right) \mathrm{Cov}\left( Y^{\mathrm{HF}}_{\ell},Q_{\ell-1}^{\mathrm{LF}} \right) 
            - \mathrm{Cov}\left( Y^{\mathrm{HF}}_{\ell},Q_{\ell}^{\mathrm{LF}} \right) \mathrm{Cov}\left( Q_{\ell}^{\mathrm{LF}},Q_{\ell-1}^{\mathrm{LF}} \right) }.
\end{equation}
%to which correspond the optimal ratio $\dfrac{\theta_\ell^2}{\tau_\ell} = \dfrac{\theta_\ell^2}{\tau_\ell} (\gamma_\ell^\star )$.

The resulting optimal allocation of samples across levels and model forms is given by
 \begin{equation*}
 \begin{split}
  r_\ell^{\star} &= -1 + \sqrt{ \dfrac{\rho_l^2 \dfrac{\theta_\ell^2}{\tau_\ell} }{1-\rho_\ell^2 \dfrac{\theta_\ell^2}{\tau_\ell}} w_{\ell}}, \quad \mathrm{where} \quad w_{\ell} 
               =  \mathcal{C}_{\ell}^{\mathrm{HF}} / \mathcal{C}_{\ell}^{\mathrm{LF}}\\
  \Lambda_{\ell} &= 1 - \rho_\ell^2 \dfrac{\theta_\ell^2}{\tau_\ell} \dfrac{r_\ell^{\star}}{1+r_\ell^{\star}}\\
  N_{\ell}^{\mathrm{HF},\star} &= \frac{2}{\varepsilon^2} \!\! \left[ \, \sum_{k=0}^{ L_{\mathrm{HF}} } 
       \left( \dfrac{ \mathbb{V}ar\left(Y_k^{ \mathrm{HF} } \right) \mathcal{C}_{k}^{\mathrm{HF}}}{1-\rho_\ell^2 \dfrac{\theta_\ell^2}{\tau_\ell}} \right)^{1/2} \Lambda_{k}(r_k^{\star})\right] 
               \sqrt{ \left( 1 - \rho_\ell^2 \dfrac{\theta_\ell^2}{\tau_\ell} \right) \frac{ \mathbb{V}ar\left( Y^{\mathrm{HF}}_{\ell} \right) }{\mathcal{C}_{\ell}^{\mathrm{HF}}}}
 \end{split}
\end{equation*}
